MG ITALS: GLM (hg38)

Start date: 03-01-2021

End date: 03-01-2021

Analysed by: Ruth Chia

Working directory: /data/NDRS_LNG/MyastheniaGravis/updated.April2020/Itals/Analysis.GLM.hg38.rerun

Notes about this rerun: Needed to remove some controls samples from dbGAP because they no longer are available for use.


In [1]:
!pwd
/gpfs/gsfs11/users/NDRS_LNG/MyastheniaGravis/updated.April2020/Itals/Analysis.GLM.hg38.rerun

QUICK GUIDE TO THE FILES I NEED HERE MG ITALS GLM ANALYSIS

  1. CLEAN raw genotype file for related+unrelated samples:
    • /data/NDRS_LNG/MyastheniaGravis/updated.April2020/Itals/CLEAN.rawGenotype.UNRELATED/FILTERED.Itals_mg_noDups.UNRELATED.hwe1e-10.geno005.bed/bim/fam
  2. QCed + Imputed dosage vcfs:
    • /data/NDRS_LNG/MyastheniaGravis/updated.April2020/Itals/Imputation.hg38/chr${CHNUM}.dose.vcf.gz
    • /data/NDRS_LNG/MyastheniaGravis/updated.April2020/Itals/Imputation.hg38/chr${CHNUM}.dose.vcf.gz.tbi
  3. List of variants to keep for analysis:
    • /data/NDRS_LNG/MyastheniaGravis/updated.April2020/Itals/Imputation.hg38/Vars.Rsq03MAF00001/Rsq03MAF00001.chr${CHNUM}.txt
  4. Covariate files for UNRELATED samples: /data/NDRS_LNG/MyastheniaGravis/updated.April2020/Itals/COVARIATES.Itals_mg_noDups.UNRELATED.txt
  5. Covariates to adjust for: GENDER,age_at_onset,PC1,PC2,PC3,PC4,PC5,PC6,PC10

Run association analysis using plink2 --glm

In [3]:
%%bash

DATA="/data/NDRS_LNG/MyastheniaGravis/updated.April2020/Itals"
COVAR="GENDER,age_at_onset,PC1,PC2,PC3,PC4,PC5,PC6,PC10"

for CHNUM in {1..22};
  do
echo "plink \
--vcf $DATA/Imputation.hg38/chr${CHNUM}.dose.vcf.gz \
--extract $DATA/Imputation.hg38/Vars.Rsq03MAF00001/Rsq03MAF00001.chr${CHNUM}.txt \
--double-id \
--pheno-name PHENO \
--pheno ../COVARIATES.Itals_mg_noDups.UNRELATED.forGLM.txt \
--covar ../COVARIATES.Itals_mg_noDups.UNRELATED.forGLM.txt \
--glm hide-covar firth-fallback cols=+a1freq,+a1freqcc,+a1count,+totallele,+a1countcc,+totallelecc,+err \
--out MG.ITALS.noDups.UNRELATED.hg38.glm_chr${CHNUM} \
--covar-name $COVAR \
--covar-variance-standardize" >> GWAS.MG.swarm
done

swarm --file GWAS.MG.swarm \
--logdir swarmOE_GWAS.LBD \
--gres=lscratch:800 \
-t 32 --partition quick -g 120 --time 02:00:00 \
--module plink/2.0-dev-20191128 \
--sbatch '--constraint=ibfdr'
9359587
In [5]:
# View glm log file
!cat MG.ITALS.noDups.UNRELATED.hg38.glm_chr1.log
PLINK v2.00a2LM 64-bit Intel (28 Nov 2019)
Options in effect:
  --covar ../COVARIATES.Itals_mg_noDups.UNRELATED.forGLM.txt
  --covar-name GENDER,age_at_onset,PC1,PC2,PC3,PC4,PC5,PC6,PC10
  --covar-variance-standardize
  --double-id
  --extract /data/NDRS_LNG/MyastheniaGravis/updated.April2020/Itals/Imputation.hg38/Vars.Rsq03MAF00001/Rsq03MAF00001.chr1.txt
  --glm hide-covar firth-fallback cols=+a1freq,+a1freqcc,+a1count,+totallele,+a1countcc,+totallelecc,+err
  --out MG.ITALS.noDups.UNRELATED.hg38.glm_chr1
  --pheno ../COVARIATES.Itals_mg_noDups.UNRELATED.forGLM.txt
  --pheno-name PHENO
  --vcf /data/NDRS_LNG/MyastheniaGravis/updated.April2020/Itals/Imputation.hg38/chr1.dose.vcf.gz

Hostname: cn2872
Working directory: /gpfs/gsfs11/users/NDRS_LNG/MyastheniaGravis/updated.April2020/Itals/Analysis.GLM.hg38.rerun
Start time: Mon Mar  1 17:27:13 2021

Random number seed: 1614637633
128639 MiB RAM detected; reserving 64319 MiB for main workspace.
Using up to 40 threads (change this with --threads).
--vcf: 23617900 variants scanned.
--vcf: MG.ITALS.noDups.UNRELATED.hg38.glm_chr1-temporary.pgen +
MG.ITALS.noDups.UNRELATED.hg38.glm_chr1-temporary.pvar +
MG.ITALS.noDups.UNRELATED.hg38.glm_chr1-temporary.psam written.
3497 samples (0 females, 0 males, 3497 ambiguous; 3497 founders) loaded from
MG.ITALS.noDups.UNRELATED.hg38.glm_chr1-temporary.psam.
23617900 variants loaded from
MG.ITALS.noDups.UNRELATED.hg38.glm_chr1-temporary.pvar.
1 binary phenotype loaded (919 cases, 2478 controls).
--extract: 3472175 variants remaining.
9 covariates loaded from ../COVARIATES.Itals_mg_noDups.UNRELATED.forGLM.txt.
--covar-variance-standardize: 9 covariates transformed.
Calculating allele frequencies... done.
3472175 variants remaining after main filters.
--glm logistic-Firth hybrid regression on phenotype 'PHENO': done.
Results written to MG.ITALS.noDups.UNRELATED.hg38.glm_chr1.PHENO.glm.logistic.hybrid .

End time: Mon Mar  1 17:46:05 2021

Merge logistic per chr files to single file

then filter by maf(cases): 0.001, 0.01, 0.05

In [6]:
%%bash

head -n 1 MG.ITALS.noDups.UNRELATED.hg38.glm_chr1.PHENO.glm.logistic.hybrid > MG.ITALS.noDups.UNRELATED.hg38.glm.Rsq03MAF00001_ALLchr.txt

for CHNUM in {1..22}
do
tail -n +2 MG.ITALS.noDups.UNRELATED.hg38.glm_chr${CHNUM}.PHENO.glm.logistic.hybrid >> MG.ITALS.noDups.UNRELATED.hg38.glm.Rsq03MAF00001_ALLchr.txt
done

## Filter by maf (cases)
awk 'NR==1;NR>1 {if($14 >= 0.001) print}' MG.ITALS.noDups.UNRELATED.hg38.glm.Rsq03MAF00001_ALLchr.txt > MG.ITALS.noDups.UNRELATED.hg38.glm.Rsq03.maf0001cases_ALLchr.txt
awk 'NR==1;NR>1 {if($14 >= 0.01) print}' MG.ITALS.noDups.UNRELATED.hg38.glm.Rsq03.maf0001cases_ALLchr.txt > MG.ITALS.noDups.UNRELATED.hg38.glm.Rsq03.maf001cases_ALLchr.txt
awk 'NR==1;NR>1 {if($14 >= 0.05) print}' MG.ITALS.noDups.UNRELATED.hg38.glm.Rsq03.maf001cases_ALLchr.txt > MG.ITALS.noDups.UNRELATED.hg38.glm.Rsq03.maf005cases_ALLchr.txt

awk 'NR==1;NR>1 {if($13 >= 0.001) print}' MG.ITALS.noDups.UNRELATED.hg38.glm.Rsq03MAF00001_ALLchr.txt > MG.ITALS.noDups.UNRELATED.hg38.glm.Rsq03.maf0001_ALLchr.txt
awk 'NR==1;NR>1 {if($13 >= 0.01) print}' MG.ITALS.noDups.UNRELATED.hg38.glm.Rsq03.maf0001_ALLchr.txt > MG.ITALS.noDups.UNRELATED.hg38.glm.Rsq03.maf001_ALLchr.txt
awk 'NR==1;NR>1 {if($13 >= 0.05) print}' MG.ITALS.noDups.UNRELATED.hg38.glm.Rsq03.maf001_ALLchr.txt > MG.ITALS.noDups.UNRELATED.hg38.glm.Rsq03.maf005_ALLchr.txt

wc -l MG.ITALS.noDups.UNRELATED.hg38.glm.Rsq03MAF00001_ALLchr.txt
wc -l MG.ITALS.noDups.UNRELATED.hg38.glm.Rsq03.maf00*_ALLchr.txt
43429888 MG.ITALS.noDups.UNRELATED.hg38.glm.Rsq03MAF00001_ALLchr.txt
  14874808 MG.ITALS.noDups.UNRELATED.hg38.glm.Rsq03.maf0001_ALLchr.txt
  16525268 MG.ITALS.noDups.UNRELATED.hg38.glm.Rsq03.maf0001cases_ALLchr.txt
   8672788 MG.ITALS.noDups.UNRELATED.hg38.glm.Rsq03.maf001_ALLchr.txt
   8669467 MG.ITALS.noDups.UNRELATED.hg38.glm.Rsq03.maf001cases_ALLchr.txt
   6179866 MG.ITALS.noDups.UNRELATED.hg38.glm.Rsq03.maf005_ALLchr.txt
   6196219 MG.ITALS.noDups.UNRELATED.hg38.glm.Rsq03.maf005cases_ALLchr.txt
  61118416 total
In [7]:
%%bash
# Remove single files
rm MG.ITALS.noDups.UNRELATED.hg38.glm_chr*.PHENO.glm.logistic.hybrid
rm MG.ITALS.noDups.UNRELATED.hg38.glm_chr*.log

Plot Manhattan and QQ

In [8]:
%%bash
# check header of covariate file
head -n 1 ../COVARIATES.Itals_mg_noDups.UNRELATED.forGLM.txt | tr '\t' '\n' | cat -n
     1	FID
     2	IID
     3	MAT
     4	PAT
     5	SEX
     6	PHENO
     7	genderONFILE
     8	age_at_onset
     9	diagnosis
    10	PC1
    11	PC2
    12	PC3
    13	PC4
    14	PC5
    15	PC6
    16	PC7
    17	PC8
    18	PC9
    19	PC10
    20	GENDER
In [9]:
%%bash
awk 'NR==1;NR>1 {if($8 != "NA") print}' ../COVARIATES.Itals_mg_noDups.UNRELATED.forGLM.txt | awk 'NR==1;NR>1 {if($20 != "NA") print}' | cut -f6 | sort | uniq -c
   2413 1
    909 2
      1 PHENO
In [2]:
%%bash
echo "Breakdown of controls by sex:"
awk 'NR==1;NR>1 {if($8 != "NA" && $20 != "NA" && $6 == 1) print}' ../COVARIATES.Itals_mg_noDups.UNRELATED.forGLM.txt | cut -f5 | sort | uniq -c


echo "Breakdown of cases by sex:"
awk 'NR==1;NR>1 {if($8 != "NA" && $20 != "NA" && $6 == 2) print}' ../COVARIATES.Itals_mg_noDups.UNRELATED.forGLM.txt | cut -f5 | sort | uniq -c
Breakdown of controls by sex:
   1388 1
   1025 2
      1 SEX
Breakdown of cases by sex:
    445 1
    464 2
      1 SEX
In [10]:
%%bash
echo "Number of case alleles:"
cut -f11 MG.ITALS.noDups.UNRELATED.hg38.glm.Rsq03.maf005_ALLchr.txt | sort | uniq -c

echo "Number of control alleles:"
cut -f12 MG.ITALS.noDups.UNRELATED.hg38.glm.Rsq03.maf005_ALLchr.txt | sort | uniq -c
Number of case alleles:
6179865 1818
      1 CASE_ALLELE_CT
Number of control alleles:
6179865 4826
      1 CTRL_ALLELE_CT
In [11]:
%%bash

CASES="909"
CONTROLS="2413"

echo "Rscript ../scripts/QQ_manhattan_plots_GLMresults.FromImputed.wgs.R MG.ITALS.noDups.UNRELATED.hg38.glm.Rsq03.maf001cases_ALLchr.txt ${CASES} ${CONTROLS} MG.ITALS.unrelated.Rsq03.maf001cases.hg38.glm-dosage" > plot.swarm
echo "Rscript ../scripts/QQ_manhattan_plots_GLMresults.FromImputed.wgs.R MG.ITALS.noDups.UNRELATED.hg38.glm.Rsq03.maf005cases_ALLchr.txt ${CASES} ${CONTROLS} MG.ITALS.unrelated.Rsq03.maf005cases.hg38.glm-dosage" >> plot.swarm

echo "Rscript ../scripts/QQ_manhattan_plots_GLMresults.FromImputed.wgs.R MG.ITALS.noDups.UNRELATED.hg38.glm.Rsq03.maf001_ALLchr.txt ${CASES} ${CONTROLS} MG.ITALS.unrelated.Rsq03.maf001.hg38.glm-dosage" >> plot.swarm
echo "Rscript ../scripts/QQ_manhattan_plots_GLMresults.FromImputed.wgs.R MG.ITALS.noDups.UNRELATED.hg38.glm.Rsq03.maf005_ALLchr.txt ${CASES} ${CONTROLS} MG.ITALS.unrelated.Rsq03.maf005.hg38.glm-dosage" >> plot.swarm

swarm --file plot.swarm --logdir swarmOE_plot --module R/3.5.2 --time 04:00:00 -g 120
9362727
In [14]:
%%bash

CASES="909"
CONTROLS="2413"

echo "Rscript ../scripts/QQ_manhattan_plots_GLMresults.FromImputed.wgs.R MG.ITALS.noDups.UNRELATED.hg38.glm.Rsq03.maf001cases.hwe1e-6_ALLchr.txt ${CASES} ${CONTROLS} MG.ITALS.unrelated.Rsq03.maf001cases.hwe1e-6.hg38.glm-dosage" > plot2.swarm
echo "Rscript ../scripts/QQ_manhattan_plots_GLMresults.FromImputed.wgs.R MG.ITALS.noDups.UNRELATED.hg38.glm.Rsq03.maf005cases.hwe1e-6_ALLchr.txt ${CASES} ${CONTROLS} MG.ITALS.unrelated.Rsq03.maf005cases.hwe1e-6.hg38.glm-dosage" >> plot2.swarm

echo "Rscript ../scripts/QQ_manhattan_plots_GLMresults.FromImputed.wgs.R MG.ITALS.noDups.UNRELATED.hg38.glm.Rsq03.maf001.hwe1e-6_ALLchr.txt ${CASES} ${CONTROLS} MG.ITALS.unrelated.Rsq03.maf001.hwe1e-6.hg38.glm-dosage" >> plot2.swarm
echo "Rscript ../scripts/QQ_manhattan_plots_GLMresults.FromImputed.wgs.R MG.ITALS.noDups.UNRELATED.hg38.glm.Rsq03.maf005.hwe1e-6_ALLchr.txt ${CASES} ${CONTROLS} MG.ITALS.unrelated.Rsq03.maf005.hwe1e-6.hg38.glm-dosage" >> plot2.swarm

swarm --file plot2.swarm --logdir swarmOE_plot --module R/3.5.2 --time 04:00:00 -g 120
9542004

Plots/Results (pre-validation)

filtered by maf (all) > 0.01

In [1]:
from IPython.display import display
from PIL import Image

print("Manhattan & QQ plots \nMG.ITALS, Rsq > 0.3, maf(all) > 0.01")
manhattan="MG.ITALS.unrelated.Rsq03.maf001.hg38.glm-dosage.manhattan_v0.jpeg"
display(Image.open(manhattan))

qq="MG.ITALS.unrelated.Rsq03.maf001.hg38.glm-dosage.QQ.jpeg"
display(Image.open(qq))
Manhattan & QQ plots 
MG.ITALS, Rsq > 0.3, maf(all) > 0.01
In [2]:
import pandas as pd
import numpy as np

mg = pd.read_csv("MG.ITALS.noDups.UNRELATED.hg38.glm.Rsq03.maf001_ALLchr.txt",sep="\t")
mgSignificant2 = mg[(mg['P'] <= 0.00000005) & (mg['#CHROM'] != 6)]
mgSignificant2 = mgSignificant2.sort_values(by=['#CHROM','POS'])
mgSignificant2.drop(['TEST','OBS_CT','Z_STAT','ERRCODE'],axis=1)
Out[2]:
#CHROM POS ID REF ALT A1 A1_CT ALLELE_CT A1_CASE_CT A1_CTRL_CT CASE_ALLELE_CT CTRL_ALLELE_CT A1_FREQ A1_CASE_FREQ A1_CTRL_FREQ FIRTH? OR LOG(OR)_SE P
2746803 5 28730630 chr5:28730630:T:A T A A 3100 6644 722 2378 1818 4826 0.466586 0.397140 0.492748 N 0.687028 0.064727 6.652700e-09
2746813 5 28733646 chr5:28733646:T:C T C C 3095 6644 722 2373 1818 4826 0.465834 0.397140 0.491712 N 0.680774 0.065788 5.068200e-09
2746814 5 28733802 chr5:28733802:G:A G A A 3066 6644 715 2351 1818 4826 0.461469 0.393289 0.487153 N 0.688650 0.065808 1.441680e-08
2746817 5 28735171 chr5:28735171:T:A T A A 3096 6644 723 2373 1818 4826 0.465984 0.397690 0.491712 N 0.683632 0.065756 7.292830e-09
2746825 5 28739470 chr5:28739470:A:C A C C 3099 6644 725 2374 1818 4826 0.466436 0.398790 0.491919 N 0.685426 0.065792 9.407880e-09
2746830 5 28741429 chr5:28741429:C:A C A A 3089 6644 721 2368 1818 4826 0.464931 0.396590 0.490676 N 0.682986 0.065820 6.922120e-09
2746832 5 28741633 chr5:28741633:T:C T C C 3089 6644 721 2368 1818 4826 0.464931 0.396590 0.490676 N 0.682986 0.065820 6.922120e-09
2746836 5 28743241 chr5:28743241:C:T C T T 3085 6644 719 2366 1818 4826 0.464329 0.395490 0.490261 N 0.679446 0.065876 4.444190e-09
2746845 5 28746208 chr5:28746208:A:G A G G 3092 6644 723 2369 1818 4826 0.465382 0.397690 0.490883 N 0.685037 0.065868 9.298490e-09
2746849 5 28746636 chr5:28746636:A:G A G G 3088 6644 721 2367 1818 4826 0.464780 0.396590 0.490468 N 0.681485 0.065923 5.989230e-09
2746850 5 28746934 chr5:28746934:GATAA:G GATAA G G 3087 6644 721 2366 1818 4826 0.464630 0.396590 0.490261 N 0.682051 0.065949 6.544010e-09
2746865 5 28754196 chr5:28754196:C:T C T T 3099 6644 725 2374 1818 4826 0.466436 0.398790 0.491919 N 0.685119 0.065783 8.995830e-09
2746867 5 28754812 chr5:28754812:C:T C T T 3099 6644 725 2374 1818 4826 0.466436 0.398790 0.491919 N 0.685119 0.065783 8.995830e-09
2746873 5 28755886 chr5:28755886:T:C T C C 3101 6644 726 2375 1818 4826 0.466737 0.399340 0.492126 N 0.686122 0.065774 1.021130e-08
In [20]:
import pandas as pd
import numpy as np

mg = pd.read_csv("MG.ITALS.noDups.UNRELATED.hg38.glm.Rsq03.maf001_ALLchr.txt",sep="\t")
mg[(mg['POS'] == 174764492) & (mg['#CHROM'] == 2)].drop(['TEST','OBS_CT','Z_STAT','ERRCODE'],axis=1)
Out[20]:
#CHROM POS ID REF ALT A1 A1_CT ALLELE_CT A1_CASE_CT A1_CTRL_CT CASE_ALLELE_CT CTRL_ALLELE_CT A1_FREQ A1_CASE_FREQ A1_CTRL_FREQ FIRTH? OR LOG(OR)_SE P
1190521 2 174764492 chr2:174764492:G:A G A A 334 6644 110 224 1818 4826 0.050271 0.060506 0.046415 N 1.42615 0.126374 0.00497

Prepare file for meta analysis

In [1]:
!cat ../scripts/toMetaprep.glm.updated.v2.R
#!/usr/bin/env Rscript

args <- commandArgs(trailingOnly=TRUE)
if (length(args) != 2) {
  stop("USAGE: Rscript toMetaprep.glm.updated.v2.R args[1] args[2]
  where args[1] = glm_results
        args[2] = OutputFileName")
}

require(data.table)
require(tidyverse)
data <- fread(args[1],header=T)

# Reformat
data$A2 <- ifelse(data$A1 == data$ALT, data$REF, data$ALT)
data$EffectAllele <- ifelse(data$A1_FREQ < 0.5, data$A1, data$A2)
data$OtherAllele <- ifelse(data$A1_FREQ < 0.5, data$A2, data$A1)
data$BETA <- ifelse(data$EffectAllele == data$A1, log(data$OR), log(data$OR)*-1)
data$SE <- data$`LOG(OR)_SE`

## freq
data$Freq_EffectAllele <- ifelse(data$EffectAllele == data$A1, data$A1_FREQ, 1-data$A1_FREQ)
data$Freq_EffectAllele.CASE <- ifelse(data$EffectAllele == data$A1, data$A1_CASE_FREQ, 1-data$A1_CASE_FREQ)
data$Freq_EffectAllele.CTRL <- ifelse(data$EffectAllele == data$A1, data$A1_CTRL_FREQ, 1-data$A1_CTRL_FREQ)

## counts
data$EffectAllele_CT <- ifelse(data$EffectAllele == data$A1, data$A1_CT, data$ALLELE_CT-data$A1_CT)
data$EffectAllele_CT.CASE <- ifelse(data$EffectAllele == data$A1, data$A1_CASE_CT, data$CASE_ALLELE_CT-data$A1_CASE_CT)
data$EffectAllele_CT.CTRL <- ifelse(data$EffectAllele == data$A1, data$A1_CTRL_CT, data$CTRL_ALLELE_CT-data$A1_CTRL_CT)

data$marker <- paste(data$`#CHROM`,data$POS,data$OtherAllele,data$EffectAllele,sep=":")
data1 <- data %>% rename(CHROM = `#CHROM`) %>% 
         select(CHROM,POS,ID,marker,EffectAllele,OtherAllele,
                OR,BETA,SE,P,Freq_EffectAllele,Freq_EffectAllele.CASE,Freq_EffectAllele.CTRL,
                ALLELE_CT,CASE_ALLELE_CT,CTRL_ALLELE_CT,
                EffectAllele_CT,EffectAllele_CT.CASE,EffectAllele_CT.CTRL) %>%
         filter(BETA > -5 & BETA < 5) %>%
         filter(!is.na(BETA))
         
dups <- data1 %>% select(marker,P) %>% filter(marker == duplicated(marker))
temp1 <- subset(data1, data1$marker %in% dups$marker) %>%
         group_by(marker) %>%
         filter(P == min(P)) %>%
         data.frame()
temp2 <- subset(data1, !(data1$marker %in% temp1$marker))
data2 <- rbind(temp1,temp2) %>% arrange(CHROM,POS)

write.table(data2, file=paste("toMeta.",args[2],".tab", sep=""), quote = F, sep = "\t", row.names = F)

In [2]:
%%bash
#module load R/3.5.2
echo "Rscript ../scripts/toMetaprep.glm.updated.v2.R MG.ITALS.noDups.UNRELATED.hg38.glm.Rsq03MAF00001_ALLchr.txt MG.ITALS.noDups.UNRELATED.hg38.glm.Rsq03MAF00001" > prep.swarm
echo "Rscript ../scripts/toMetaprep.glm.updated.v2.R MG.ITALS.noDups.UNRELATED.hg38.glm.Rsq03MAF00001.hwe1e-6_ALLchr.txt MG.ITALS.noDups.UNRELATED.hg38.glm.Rsq03MAF00001.hwe1e-6" >> prep.swarm
swarm --file prep.swarm --logdir swarmOE_prep -g 120 --module R/3.5.2 --gres=lscratch:500  --time 4:00:00
10633321

Run conditional analysis from top hits from meta analysis

In [1]:
%%bash
# View top hits from meta analysis
DIR="/data/NDRS_LNG/MyastheniaGravis/updated.April2020/MetaAnalysis.hg38.ByCohort.rerun/USmerged.Itals.hg38"
cat $DIR/IndexVarsSummStats.META_MG.USmerged.Itals.UNRELATED.hg38.Rsq03.filteredDirectionHetISq80MAF001cases.p5e-8.txt | column -t
CHROM  POS        MarkerName        Allele1  Allele2  Freq1   Effect   StdErr  Direction  P          HetISq
1      113761186  1:113761186:C:A   a        c        0.1154  0.3937   0.0613  ++         1.354e-10  0
1      113834946  1:113834946:G:A   a        g        0.1166  0.396    0.0609  ++         7.949e-11  0
2      174764492  2:174764492:G:A   a        g        0.057   0.4495   0.0812  ++         3.072e-08  0
6      31358835   6:31358835:A:G    a        g        0.628   -0.2602  0.039   --         2.554e-11  79.6
6      31358841   6:31358841:A:G    a        g        0.628   -0.2602  0.039   --         2.554e-11  79.6
6      32620936   6:32620936:T:C    t        c        0.9743  0.8661   0.1128  ++         1.576e-14  40.1
6      32625016   6:32625016:C:T    t        c        0.0257  -0.8661  0.1128  --         1.582e-14  40.2
8      108247378  8:108247378:T:C   t        c        0.9875  -1.0637  0.1935  --         3.859e-08  0
8      108248309  8:108248309:G:GA  g        ga       0.9875  -1.0637  0.1935  --         3.859e-08  0
10     7410781    10:7410781:A:G    a        g        0.7273  -0.2369  0.042   --         1.665e-08  46
11     95578258   11:95578258:T:C   t        c        0.7835  -0.2545  0.045   --         1.541e-08  0
18     62342581   18:62342581:T:C   t        c        0.4631  -0.2874  0.0369  --         7.085e-15  73.7
18     62343215   18:62343215:C:A   a        c        0.5364  0.287    0.0369  ++         7.692e-15  73.7

Get imputation info for each index variant/hit from Italian cohort. Need to make sure that the imputation quality are good across imputed cohorts.

In [2]:
%%bash
DATA="/data/NDRS_LNG/MyastheniaGravis/updated.April2020/Itals"
head -n 1 $DATA/Imputation.hg38/chr1.info > topHits.ImputationInfo.Itals.txt
grep "chr1:113834946:" $DATA/Imputation.hg38/chr1.info >> topHits.ImputationInfo.Itals.txt
grep "chr2:174764492:" $DATA/Imputation.hg38/chr2.info >> topHits.ImputationInfo.Itals.txt
grep "chr6:32620936:" $DATA/Imputation.hg38/chr6.info >> topHits.ImputationInfo.Itals.txt
grep "chr8:108247378:" $DATA/Imputation.hg38/chr8.info >> topHits.ImputationInfo.Itals.txt
grep "chr10:7410781:" $DATA/Imputation.hg38/chr10.info >> topHits.ImputationInfo.Itals.txt
grep "chr11:95578258:" $DATA/Imputation.hg38/chr11.info >> topHits.ImputationInfo.Itals.txt
grep "chr18:62355088:" $DATA/Imputation.hg38/chr18.info >> topHits.ImputationInfo.Itals.txt
In [3]:
import pandas as pd
pd.read_csv("topHits.ImputationInfo.Itals.txt",sep="\t")
Out[3]:
SNP REF(0) ALT(1) ALT_Frq MAF AvgCall Rsq Genotyped LooRsq EmpR EmpRsq Dose0 Dose1
0 chr1:113834946:A:G A G 0.94778 0.05222 0.98876 0.80838 Imputed - - - - -
1 chr2:174764492:G:A G A 0.04831 0.04831 0.99567 0.92378 Imputed - - - - -
2 chr6:32620936:T:C T C 0.03545 0.03545 0.99940 0.98518 Imputed - - - - -
3 chr8:108247378:T:C T C 0.00934 0.00934 0.99971 0.97632 Imputed - - - - -
4 chr10:7410781:A:G A G 0.26173 0.26173 0.98642 0.94508 Imputed - - - - -
5 chr11:95578258:T:C T C 0.19086 0.19086 0.99981 0.99910 Genotyped 0.981 0.994 0.98804 0.99064 0.00364
6 chr18:62355088:C:T C T 0.60668 0.39332 0.99559 0.98300 Imputed - - - - -

On single top index variant

List of single index variants per loci to condition on:

  • chr1:113834946:A:G
  • chr2:174764492:G:A
  • chr6:32620936:T:C
  • chr8:108247378:T:C
  • chr10:7410781:A:G
  • chr11:95578258:T:C
  • chr18:62355088:C:T

Notes:

  1. conditioning on chr6:32620936:T:C revealed there was a second independent peak in the same locus. The top hit here is chr6:31358836:G:A.
  2. Run additional conditional analysis on:
  • chr6:31358836:G:A
  • chr6:31358836:G:A,chr6:32620936:T:C
In [4]:
%%bash
# create list of snps to condition on 
echo "chr1:113834946:A:G" > IndexVars.ConditionList.txt
echo "chr2:174764492:G:A" >> IndexVars.ConditionList.txt
echo "chr6:32620936:T:C" >> IndexVars.ConditionList.txt
echo "chr8:108247378:T:C" >> IndexVars.ConditionList.txt
echo "chr10:7410781:A:G" >> IndexVars.ConditionList.txt
echo "chr11:95578258:T:C" >> IndexVars.ConditionList.txt
echo "chr18:62355088:C:T" >> IndexVars.ConditionList.txt

cat IndexVars.ConditionList.txt
chr1:113834946:A:G
chr2:174764492:G:A
chr6:32620936:T:C
chr8:108247378:T:C
chr10:7410781:A:G
chr11:95578258:T:C
chr18:62355088:C:T
In [16]:
%%bash
mkdir ConditionalAnalysis

DATA="/data/NDRS_LNG/MyastheniaGravis/updated.April2020/Itals"
COVAR="GENDER,age_at_onset,PC1,PC2,PC3,PC4,PC5,PC6,PC10"

for CHNUM in {1,2,6,8,10,11,18};
  do
echo "plink \
--vcf $DATA/Imputation.hg38/chr${CHNUM}.dose.vcf.gz \
--extract $DATA/Imputation.hg38/Vars.Rsq03MAF00001/Rsq03MAF00001.chr${CHNUM}.txt \
--double-id \
--pheno-name PHENO \
--pheno ../COVARIATES.Itals_mg_noDups.UNRELATED.forGLM.txt \
--covar ../COVARIATES.Itals_mg_noDups.UNRELATED.forGLM.txt \
--glm hide-covar firth-fallback cols=+a1freq,+a1freqcc,+a1count,+totallele,+a1countcc,+totallelecc,+err \
--out ConditionalAnalysis/MG.Itals.noDups.UNRELATED.hg38.conditioned.glm_chr${CHNUM} \
--condition-list IndexVars.ConditionList.txt \
--covar-name $COVAR \
--covar-variance-standardize" >> GWAS.condition.swarm
done
In [17]:
%%bash
swarm --file GWAS.condition.swarm \
--logdir swarmOE_GWAS.MG \
--gres=lscratch:800 \
-t 32 --partition quick -g 120 --time 04:00:00 \
--module plink/2.0-dev-20191128 \
--sbatch '--constraint=ibfdr'
10209559

Run additional conditional analysis for a second independent signal on chr6

In [45]:
%%bash

DATA="/data/NDRS_LNG/MyastheniaGravis/updated.April2020/Itals"
COVAR="GENDER,age_at_onset,PC1,PC2,PC3,PC4,PC5,PC6,PC10"

for CHNUM in 6;
  do
echo "plink \
--vcf $DATA/Imputation.hg38/chr${CHNUM}.dose.vcf.gz \
--extract $DATA/Imputation.hg38/Vars.Rsq03MAF00001/Rsq03MAF00001.chr${CHNUM}.txt \
--double-id \
--pheno-name PHENO \
--pheno ../COVARIATES.Itals_mg_noDups.UNRELATED.forGLM.txt \
--covar ../COVARIATES.Itals_mg_noDups.UNRELATED.forGLM.txt \
--glm hide-covar firth-fallback cols=+a1freq,+a1freqcc,+a1count,+totallele,+a1countcc,+totallelecc,+err \
--out ConditionalAnalysis/MG.Itals.noDups.UNRELATED.hg38.conditioned.glm.chr6_31358836_A_G_chr${CHNUM} \
--condition chr6:31358836:A:G \
--covar-name $COVAR \
--covar-variance-standardize" > GWAS.condition.2.swarm
done

echo "chr6:31358836:A:G" > IndexVars.ConditionList.2.txt
echo "chr6:32620936:T:C" >> IndexVars.ConditionList.2.txt

for CHNUM in 6;
  do
echo "plink \
--vcf $DATA/Imputation.hg38/chr${CHNUM}.dose.vcf.gz \
--extract $DATA/Imputation.hg38/Vars.Rsq03MAF00001/Rsq03MAF00001.chr${CHNUM}.txt \
--double-id \
--pheno-name PHENO \
--pheno ../COVARIATES.Itals_mg_noDups.UNRELATED.forGLM.txt \
--covar ../COVARIATES.Itals_mg_noDups.UNRELATED.forGLM.txt \
--glm hide-covar firth-fallback cols=+a1freq,+a1freqcc,+a1count,+totallele,+a1countcc,+totallelecc,+err \
--out ConditionalAnalysis/MG.Itals.noDups.UNRELATED.hg38.conditioned.glm.chr6_31358836_A_G.chr6_32620936_T_C_chr${CHNUM} \
--condition-list IndexVars.ConditionList.2.txt \
--covar-name $COVAR \
--covar-variance-standardize" >> GWAS.condition.2.swarm
done
In [46]:
%%bash
swarm --file GWAS.condition.2.swarm \
--logdir swarmOE_GWAS.MG \
--gres=lscratch:800 \
-t 32 --partition quick -g 120 --time 04:00:00 \
--module plink/2.0-dev-20191128 \
--sbatch '--constraint=ibfdr'
10268531

Merge files

In [22]:
%%bash
cd ConditionalAnalysis

for Variant in chr1_113834946_A_G;
do 
awk 'NR==1;NR>1 {if($22 != "NA" && $14 >= 0.01) print}' MG.Itals.noDups.UNRELATED.hg38.conditioned.glm_chr1.PHENO.glm.logistic.hybrid > MG.Itals.noDups.UNRELATED.hg38.glm.condition.${Variant}.chr1.maf001cases.txt
done

for Variant in chr2_174764492_G_A;
do 
awk 'NR==1;NR>1 {if($22 != "NA" && $14 >= 0.01) print}' MG.Itals.noDups.UNRELATED.hg38.conditioned.glm_chr2.PHENO.glm.logistic.hybrid > MG.Itals.noDups.UNRELATED.hg38.glm.condition.${Variant}.chr2.maf001cases.txt
done

for Variant in chr6_32620936_T_C;
do 
awk 'NR==1;NR>1 {if($22 != "NA" && $14 >= 0.01) print}' MG.Itals.noDups.UNRELATED.hg38.conditioned.glm_chr6.PHENO.glm.logistic.hybrid > MG.Itals.noDups.UNRELATED.hg38.glm.condition.${Variant}.chr6.maf001cases.txt
done

for Variant in chr8_108247378_T_C;
do 
awk 'NR==1;NR>1 {if($22 != "NA" && $14 >= 0.01) print}' MG.Itals.noDups.UNRELATED.hg38.conditioned.glm_chr8.PHENO.glm.logistic.hybrid > MG.Itals.noDups.UNRELATED.hg38.glm.condition.${Variant}.chr8.maf001cases.txt
done

for Variant in chr10_7410781_A_G;
do 
awk 'NR==1;NR>1 {if($22 != "NA" && $14 >= 0.01) print}' MG.Itals.noDups.UNRELATED.hg38.conditioned.glm_chr10.PHENO.glm.logistic.hybrid > MG.Itals.noDups.UNRELATED.hg38.glm.condition.${Variant}.chr10.maf001cases.txt
done

for Variant in chr11_95578258_T_C;
do 
awk 'NR==1;NR>1 {if($22 != "NA" && $14 >= 0.01) print}' MG.Itals.noDups.UNRELATED.hg38.conditioned.glm_chr11.PHENO.glm.logistic.hybrid > MG.Itals.noDups.UNRELATED.hg38.glm.condition.${Variant}.chr11.maf001cases.txt
done

for Variant in chr18_62355088_C_T;
do 
awk 'NR==1;NR>1 {if($22 != "NA" && $14 >= 0.01) print}' MG.Itals.noDups.UNRELATED.hg38.conditioned.glm_chr18.PHENO.glm.logistic.hybrid > MG.Itals.noDups.UNRELATED.hg38.glm.condition.${Variant}.chr18.maf001cases.txt
done
In [49]:
%%bash
cd ConditionalAnalysis

for Variant in {chr6_31358836_A_G,chr6_31358836_A_G.chr6_32620936_T_C};
do 
awk 'NR==1;NR>1 {if($22 != "NA" && $14 >= 0.01) print}' MG.Itals.noDups.UNRELATED.hg38.conditioned.glm.${Variant}_chr6.PHENO.glm.logistic.hybrid > MG.Itals.noDups.UNRELATED.hg38.glm.condition.${Variant}.chr6.maf001cases.txt
done
In [50]:
%%bash
cd ConditionalAnalysis
## Tidy up directory
mkdir toArchive
mv MG.Itals.noDups.UNRELATED.hg38.conditioned.glm*_chr*.PHENO.glm.logistic.hybrid ./toArchive
mv MG.Itals.noDups.UNRELATED.hg38.conditioned.glm*_chr*.log ./toArchive

Plot conditioned manhattan (per loci/chromosome)

In [24]:
%%bash
awk 'NR==1;NR>1 {if($8 != "NA") print}' ../COVARIATES.Itals_mg_noDups.UNRELATED.forGLM.txt | awk 'NR==1;NR>1 {if($20 != "NA") print}' | cut -f6 | sort | uniq -c
   2413 1
    909 2
      1 PHENO
In [25]:
%%bash
cd ConditionalAnalysis

# Plot chromosomes without conditional analysis for comparison
# First subset chromosome from summ stats
for CHNUM in {1,2,6,8,10,11,18}
do
awk "NR==1;NR>1 {if(\$1 == ${CHNUM}) print}" ../MG.ITALS.noDups.UNRELATED.hg38.glm.Rsq03.maf001cases_ALLchr.txt > summ.stats.unconditioned.chr${CHNUM}.txt
done
In [26]:
%%bash
cd ConditionalAnalysis

cases="909"
controls="2413"

echo "Rscript ../../scripts/QQ_manhattan_plots_GLMresults.FromImputed.wgs.R summ.stats.unconditioned.chr1.txt $cases $controls MG.Itals.noDups.UNRELATED.hg38.glm.Rsq03.maf001cases.unconditioned.chr1" > plot1.swarm
echo "Rscript ../../scripts/QQ_manhattan_plots_GLMresults.FromImputed.wgs.R summ.stats.unconditioned.chr2.txt $cases $controls MG.Itals.noDups.UNRELATED.hg38.glm.Rsq03.maf001cases.unconditioned.chr2" >> plot1.swarm
echo "Rscript ../../scripts/QQ_manhattan_plots_GLMresults.FromImputed.wgs.R summ.stats.unconditioned.chr6.txt $cases $controls MG.Itals.noDups.UNRELATED.hg38.glm.Rsq03.maf001cases.unconditioned.chr6" >> plot1.swarm
echo "Rscript ../../scripts/QQ_manhattan_plots_GLMresults.FromImputed.wgs.R summ.stats.unconditioned.chr8.txt $cases $controls MG.Itals.noDups.UNRELATED.hg38.glm.Rsq03.maf001cases.unconditioned.chr8" >> plot1.swarm
echo "Rscript ../../scripts/QQ_manhattan_plots_GLMresults.FromImputed.wgs.R summ.stats.unconditioned.chr10.txt $cases $controls MG.Itals.noDups.UNRELATED.hg38.glm.Rsq03.maf001cases.unconditioned.chr10" >> plot1.swarm
echo "Rscript ../../scripts/QQ_manhattan_plots_GLMresults.FromImputed.wgs.R summ.stats.unconditioned.chr11.txt $cases $controls MG.Itals.noDups.UNRELATED.hg38.glm.Rsq03.maf001cases.unconditioned.chr11" >> plot1.swarm
echo "Rscript ../../scripts/QQ_manhattan_plots_GLMresults.FromImputed.wgs.R summ.stats.unconditioned.chr18.txt $cases $controls MG.Itals.noDups.UNRELATED.hg38.glm.Rsq03.maf001cases.unconditioned.chr18" >> plot1.swarm

swarm --file plot1.swarm --logdir swarmOE_plot --module R/3.5.2 -g 120 --time 02:00:00 --partition quick
10212187
In [27]:
%%bash
cd ConditionalAnalysis

cases="909"
controls="2413"

echo "Rscript ../../scripts/QQ_manhattan_plots_GLMresults.FromImputed.wgs.R MG.Itals.noDups.UNRELATED.hg38.glm.condition.chr1_113834946_A_G.chr1.maf001cases.txt $cases $controls MG.Itals.noDups.UNRELATED.hg38.glm.condition.chr1_113834946_A_G.chr1.maf001cases" > plot2.swarm
echo "Rscript ../../scripts/QQ_manhattan_plots_GLMresults.FromImputed.wgs.R MG.Itals.noDups.UNRELATED.hg38.glm.condition.chr2_174764492_G_A.chr2.maf001cases.txt $cases $controls MG.Itals.noDups.UNRELATED.hg38.glm.condition.chr2_174764492_G_A.chr2.maf001cases" >> plot2.swarm
echo "Rscript ../../scripts/QQ_manhattan_plots_GLMresults.FromImputed.wgs.R MG.Itals.noDups.UNRELATED.hg38.glm.condition.chr6_32620936_T_C.chr6.maf001cases.txt $cases $controls MG.Itals.noDups.UNRELATED.hg38.glm.condition.chr6_32620936_T_C.chr6.maf001cases" >> plot2.swarm
echo "Rscript ../../scripts/QQ_manhattan_plots_GLMresults.FromImputed.wgs.R MG.Itals.noDups.UNRELATED.hg38.glm.condition.chr8_108247378_T_C.chr8.maf001cases.txt $cases $controls MG.Itals.noDups.UNRELATED.hg38.glm.condition.chr8_108247378_T_C.chr8.maf001cases" >> plot2.swarm
echo "Rscript ../../scripts/QQ_manhattan_plots_GLMresults.FromImputed.wgs.R MG.Itals.noDups.UNRELATED.hg38.glm.condition.chr10_7410781_A_G.chr10.maf001cases.txt $cases $controls MG.Itals.noDups.UNRELATED.hg38.glm.condition.chr10_7410781_A_G.chr10.maf001cases" >> plot2.swarm
echo "Rscript ../../scripts/QQ_manhattan_plots_GLMresults.FromImputed.wgs.R MG.Itals.noDups.UNRELATED.hg38.glm.condition.chr11_95578258_T_C.chr11.maf001cases.txt $cases $controls MG.Itals.noDups.UNRELATED.hg38.glm.condition.chr11_95578258_T_C.chr11.maf001cases" >> plot2.swarm
echo "Rscript ../../scripts/QQ_manhattan_plots_GLMresults.FromImputed.wgs.R MG.Itals.noDups.UNRELATED.hg38.glm.condition.chr18_62355088_C_T.chr18.maf001cases.txt $cases $controls MG.Itals.noDups.UNRELATED.hg38.glm.condition.chr18_62355088_C_T.chr18.maf001cases" >> plot2.swarm

swarm --file plot2.swarm --logdir swarmOE_plot --module R/3.5.2 -g 120 --time 02:00:00 --partition quick
10212213
In [56]:
%%bash
cd ConditionalAnalysis

cases="909"
controls="2413"

echo "Rscript ../../scripts/QQ_manhattan_plots_GLMresults.FromImputed.wgs.R MG.Itals.noDups.UNRELATED.hg38.glm.condition.chr6_31358836_A_G.chr6.maf001cases.txt $cases $controls MG.Itals.noDups.UNRELATED.hg38.glm.condition.chr6_31358836_A_G.chr6.maf001cases" > plot3.swarm
echo "Rscript ../../scripts/QQ_manhattan_plots_GLMresults.FromImputed.wgs.R MG.Itals.noDups.UNRELATED.hg38.glm.condition.chr6_31358836_A_G.chr6_32620936_T_C.chr6.maf001cases.txt $cases $controls MG.Itals.noDups.UNRELATED.hg38.glm.condition.chr6_31358836_A_G.chr6_32620936_T_C.chr6.maf001cases" >> plot3.swarm

swarm --file plot3.swarm --logdir swarmOE_plot --module R/3.5.2 -g 120 --time 02:00:00 --partition quick
10298306

chr1_113834946_A_G

In [30]:
from IPython.display import display
from PIL import Image

uncond="ConditionalAnalysis/MG.Itals.noDups.UNRELATED.hg38.glm.Rsq03.maf001cases.unconditioned.chr1.manhattan_v1.jpeg"
display(Image.open(uncond))

cond="ConditionalAnalysis/MG.Itals.noDups.UNRELATED.hg38.glm.condition.chr1_113834946_A_G.chr1.maf001cases.manhattan_v1.jpeg"
display(Image.open(cond))

chr2_174764492_G_A

In [31]:
from IPython.display import display
from PIL import Image

uncond="ConditionalAnalysis/MG.Itals.noDups.UNRELATED.hg38.glm.Rsq03.maf001cases.unconditioned.chr2.manhattan_v1.jpeg"
display(Image.open(uncond))

cond="ConditionalAnalysis/MG.Itals.noDups.UNRELATED.hg38.glm.condition.chr2_174764492_G_A.chr2.maf001cases.manhattan_v1.jpeg"
display(Image.open(cond))

chr6_32620936_T_C

In [3]:
from IPython.display import display
from PIL import Image

uncond="ConditionalAnalysis/MG.Itals.noDups.UNRELATED.hg38.glm.Rsq03.maf001cases.unconditioned.chr6.manhattan_v0.jpeg"
display(Image.open(uncond))

cond="ConditionalAnalysis/MG.Itals.noDups.UNRELATED.hg38.glm.condition.chr6_32620936_T_C.chr6.maf001cases.manhattan_v0.jpeg"
display(Image.open(cond))

chr6_31358836_A_G

In [59]:
from IPython.display import display
from PIL import Image

cond="ConditionalAnalysis/MG.Itals.noDups.UNRELATED.hg38.glm.condition.chr6_31358836_A_G.chr6.maf001cases.manhattan_v1.jpeg"
display(Image.open(cond))

chr6_31358836_A_G.chr6_32620936_T_C

In [60]:
from IPython.display import display
from PIL import Image

cond="ConditionalAnalysis/MG.Itals.noDups.UNRELATED.hg38.glm.condition.chr6_31358836_A_G.chr6_32620936_T_C.chr6.maf001cases.manhattan_v1.jpeg"
display(Image.open(cond))

chr8_108247378_T_C

In [33]:
from IPython.display import display
from PIL import Image

uncond="ConditionalAnalysis/MG.Itals.noDups.UNRELATED.hg38.glm.Rsq03.maf001cases.unconditioned.chr8.manhattan_v1.jpeg"
display(Image.open(uncond))

cond="ConditionalAnalysis/MG.Itals.noDups.UNRELATED.hg38.glm.condition.chr8_108247378_T_C.chr8.maf001cases.manhattan_v1.jpeg"
display(Image.open(cond))

chr10_7410781_A_G

In [34]:
from IPython.display import display
from PIL import Image

uncond="ConditionalAnalysis/MG.Itals.noDups.UNRELATED.hg38.glm.Rsq03.maf001cases.unconditioned.chr10.manhattan_v1.jpeg"
display(Image.open(uncond))

cond="ConditionalAnalysis/MG.Itals.noDups.UNRELATED.hg38.glm.condition.chr10_7410781_A_G.chr10.maf001cases.manhattan_v1.jpeg"
display(Image.open(cond))

chr11_95578258_T_C

In [35]:
from IPython.display import display
from PIL import Image

uncond="ConditionalAnalysis/MG.Itals.noDups.UNRELATED.hg38.glm.Rsq03.maf001cases.unconditioned.chr11.manhattan_v1.jpeg"
display(Image.open(uncond))

cond="ConditionalAnalysis/MG.Itals.noDups.UNRELATED.hg38.glm.condition.chr11_95578258_T_C.chr11.maf001cases.manhattan_v1.jpeg"
display(Image.open(cond))

chr18_62355088_C_T

In [36]:
from IPython.display import display
from PIL import Image

uncond="ConditionalAnalysis/MG.Itals.noDups.UNRELATED.hg38.glm.Rsq03.maf001cases.unconditioned.chr18.manhattan_v1.jpeg"
display(Image.open(uncond))

cond="ConditionalAnalysis/MG.Itals.noDups.UNRELATED.hg38.glm.condition.chr18_62355088_C_T.chr18.maf001cases.manhattan_v1.jpeg"
display(Image.open(cond))

Plot genome-wide manhattan and QQ (with summ stats from conditional analysis) for meta analysis

Merge conditional analysis with non-conditional summary stats

Notes:

  1. Replace results from chr1,2,6,8,10,11 and 18 from non-conditional summary stats with summ stats from conditional analysis --> plot QQ and Manhattan
  2. Where there are multiple significant loci per chromosome, the index variant with the smallest p-value is used for conditional analysis.

Index variants:

  • chr1_113834946_A_G
  • chr2_174764492_G_A
  • chr6_32620936_T_C
  • chr8_108247378_T_C
  • chr10_7410781_A_G
  • chr11_95578258_T_C
  • chr18_62355088_C_T
In [37]:
%%bash
cd ConditionalAnalysis

head -n 1 ./toArchive/MG.Itals.noDups.UNRELATED.hg38.conditioned.glm_chr1.PHENO.glm.logistic.hybrid > MG.Itals.noDups.UNRELATED.hg38.glm.condition.tophits.txt 

tail -n +2 ./toArchive/MG.Itals.noDups.UNRELATED.hg38.conditioned.glm_chr1.PHENO.glm.logistic.hybrid >> MG.Itals.noDups.UNRELATED.hg38.glm.condition.tophits.txt
tail -n +2 ./toArchive/MG.Itals.noDups.UNRELATED.hg38.conditioned.glm_chr2.PHENO.glm.logistic.hybrid >> MG.Itals.noDups.UNRELATED.hg38.glm.condition.tophits.txt
tail -n +2 ./toArchive/MG.Itals.noDups.UNRELATED.hg38.conditioned.glm_chr6.PHENO.glm.logistic.hybrid >> MG.Itals.noDups.UNRELATED.hg38.glm.condition.tophits.txt
tail -n +2 ./toArchive/MG.Itals.noDups.UNRELATED.hg38.conditioned.glm_chr8.PHENO.glm.logistic.hybrid >> MG.Itals.noDups.UNRELATED.hg38.glm.condition.tophits.txt
tail -n +2 ./toArchive/MG.Itals.noDups.UNRELATED.hg38.conditioned.glm_chr10.PHENO.glm.logistic.hybrid >> MG.Itals.noDups.UNRELATED.hg38.glm.condition.tophits.txt
tail -n +2 ./toArchive/MG.Itals.noDups.UNRELATED.hg38.conditioned.glm_chr11.PHENO.glm.logistic.hybrid >> MG.Itals.noDups.UNRELATED.hg38.glm.condition.tophits.txt
tail -n +2 ./toArchive/MG.Itals.noDups.UNRELATED.hg38.conditioned.glm_chr18.PHENO.glm.logistic.hybrid >> MG.Itals.noDups.UNRELATED.hg38.glm.condition.tophits.txt
In [38]:
%%bash
cd ConditionalAnalysis

module load R/3.5.2
R --vanilla --no-save

# Load libraries
require(data.table)
require(tidyverse)

# Read in data
summ <- fread("../MG.ITALS.noDups.UNRELATED.hg38.glm.Rsq03MAF00001_ALLchr.txt",header=T)
summ1 <- summ %>% filter(`#CHROM` != "1" &
                         `#CHROM` != "2" & 
                         `#CHROM` != "6" & 
                         `#CHROM` != "8" & 
                         `#CHROM` != "10" & 
                         `#CHROM` != "11" & 
                         `#CHROM` != "18")
cond <- fread("MG.Itals.noDups.UNRELATED.hg38.glm.condition.tophits.txt",header=T)
both <- rbind(summ1,cond) %>% arrange(`#CHROM`,POS)
write.table(both,"MG.Itals.noDups.UNRELATED.hg38.MERGEDconditional.txt",quote=F,sep="\t",col.names=T,row.names=F)
R version 3.5.2 (2018-12-20) -- "Eggshell Igloo"
Copyright (C) 2018 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> 
> # Load libraries
> require(data.table)
> require(tidyverse)
> 
> # Read in data
> summ <- fread("../MG.ITALS.noDups.UNRELATED.hg38.glm.Rsq03MAF00001_ALLchr.txt",header=T)
> summ1 <- summ %>% filter(`#CHROM` != "1" &
+                          `#CHROM` != "2" & 
+                          `#CHROM` != "6" & 
+                          `#CHROM` != "8" & 
+                          `#CHROM` != "10" & 
+                          `#CHROM` != "11" & 
+                          `#CHROM` != "18")
> cond <- fread("MG.Itals.noDups.UNRELATED.hg38.glm.condition.tophits.txt",header=T)
> both <- rbind(summ1,cond) %>% arrange(`#CHROM`,POS)
> write.table(both,"MG.Itals.noDups.UNRELATED.hg38.MERGEDconditional.txt",quote=F,sep="\t",col.names=T,row.names=F)
> 
[-] Unloading gcc  9.2.0  ... 
[-] Unloading GSL 2.6 for GCC 9.2.0 ... 
[-] Unloading openmpi 3.1.4  for GCC 9.2.0 
[-] Unloading ImageMagick  7.0.8  on cn3524 
[-] Unloading HDF5  1.10.4 
[-] Unloading NetCDF 4.7.4_gcc9.2.0 
[-] Unloading pandoc  2.11.4  on cn3524 
[-] Unloading pcre2 10.21  ... 
[-] Unloading R 4.0.3 
[+] Loading gcc  7.3.0  ... 
[+] Loading GSL 2.4 for GCC 7.2.0 ... 
[-] Unloading gcc  7.3.0  ... 
[+] Loading gcc  7.3.0  ... 
[+] Loading openmpi 3.0.2  for GCC 7.3.0 
[+] Loading ImageMagick  7.0.8  on cn3524 
[+] Loading HDF5  1.10.4 
[+] Loading pandoc  2.11.4  on cn3524 
[+] Loading R 3.5.2 

The following have been reloaded with a version change:
  1) GSL/2.6_gcc-9.2.0 => GSL/2.4_gcc-7.2.0     3) gcc/9.2.0 => gcc/7.3.0
  2) R/4.0 => R/3.5.2

Loading required package: data.table
Loading required package: tidyverse
-- Attaching packages --------------------------------------- tidyverse 1.2.1 --
v ggplot2 3.3.2     v purrr   0.3.4
v tibble  3.0.3     v dplyr   0.8.5
v tidyr   0.8.3     v stringr 1.4.0
v readr   1.3.1     v forcats 0.5.0
-- Conflicts ------------------------------------------ tidyverse_conflicts() --
x dplyr::between()   masks data.table::between()
x dplyr::filter()    masks stats::filter()
x dplyr::first()     masks data.table::first()
x dplyr::lag()       masks stats::lag()
x dplyr::last()      masks data.table::last()
x purrr::transpose() masks data.table::transpose()

Prepare file for conditional meta analysis

In [4]:
%%bash
cd ConditionalAnalysis
#module load R/3.5.2
echo "Rscript ../../scripts/toMetaprep.glm.updated.v2.R MG.Itals.noDups.UNRELATED.hg38.MERGEDconditional.txt MG.Itals.noDups.UNRELATED.hg38.MERGEDconditional" > prep2.swarm
swarm --file prep2.swarm --logdir swarmOE_prep -g 120 --module R/3.5.2 --gres=lscratch:500  --time 4:00:00
10633346
In [5]:
%%bash
# for the additional independentt signal on chr 6
cd ConditionalAnalysis

module load R/3.5.2
for Variant in {chr6_31358836_A_G,chr6_31358836_A_G.chr6_32620936_T_C};
do
echo "Rscript ../../scripts/toMetaprep.glm.updated.v2.R MG.Itals.noDups.UNRELATED.hg38.glm.condition.${Variant}.chr6.maf001cases.txt MG.Itals.noDups.UNRELATED.hg38.glm.condition.${Variant}.chr6.maf001cases" >> prep3.swarm
done
swarm --file prep3.swarm --logdir swarmOE_prep -g 120 --module R/3.5.2 --gres=lscratch:500  --time 4:00:00
10633367
[-] Unloading gcc  9.2.0  ... 
[-] Unloading GSL 2.6 for GCC 9.2.0 ... 
[-] Unloading openmpi 3.1.4  for GCC 9.2.0 
[-] Unloading ImageMagick  7.0.8  on cn2969 
[-] Unloading HDF5  1.10.4 
[-] Unloading NetCDF 4.7.4_gcc9.2.0 
[-] Unloading pandoc  2.11.4  on cn2969 
[-] Unloading pcre2 10.21  ... 
[-] Unloading R 4.0.3 
[+] Loading gcc  7.3.0  ... 
[+] Loading GSL 2.4 for GCC 7.2.0 ... 
[-] Unloading gcc  7.3.0  ... 
[+] Loading gcc  7.3.0  ... 
[+] Loading openmpi 3.0.2  for GCC 7.3.0 
[+] Loading ImageMagick  7.0.8  on cn2969 
[+] Loading HDF5  1.10.4 
[+] Loading pandoc  2.11.4  on cn2969 
[+] Loading R 3.5.2 

The following have been reloaded with a version change:
  1) GSL/2.6_gcc-9.2.0 => GSL/2.4_gcc-7.2.0     3) gcc/9.2.0 => gcc/7.3.0
  2) R/4.0 => R/3.5.2

In [ ]: